Factor-Based Uyghur-Chinese Statistical Machine Translation

Authors

  • Xinghua Dong
  • Huajian Xue
  • Yong Yang

Abstract

This paper is an initial exploration of Uyghur-Chinese statistical machine translation. Uyghur and Chinese differ greatly. The former is an agglutinative language with highly productive inflectional and derivational word-formation processes, whereas the latter is written in largely ideographic characters, so morpheme processing does not apply to it at all. We integrate additional Uyghur information, such as affixes and stems, into a statistical machine translation system through the so-called factored model, an extension of the phrase-based approach. Experiments show that these morphological strategies can effectively improve translation performance.
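To make the idea of a factored representation concrete, here is a sketch that annotates each Uyghur token with stem and suffix factors in the `|`-separated token format read by factored toolkits such as Moses. It is a minimal illustration under stated assumptions, not the authors' pipeline: the suffix list `SUFFIXES`, the greedy right-to-left stripping rule and the helpers `segment` and `to_factored` are all hypothetical, and real Uyghur morphology (vowel harmony, vowel weakening before suffixes) would need a proper morphological analyzer.

```python
# Hypothetical toy suffix inventory (Latin-script Uyghur). Illustrative only;
# a real system would use a morphological analyzer instead of this list.
SUFFIXES = ["lar", "ler", "da", "de", "ta", "te", "ning", "ni", "gha", "ge"]
# Try longer suffixes first so that e.g. "ning" wins over "ni".
ORDERED = sorted(SUFFIXES, key=len, reverse=True)


def segment(word: str, min_stem: int = 2) -> tuple[str, list[str]]:
    """Greedily strip known suffixes from the right end of a word.

    Returns (stem, suffixes) with suffixes in surface order. A suffix is
    only stripped if at least `min_stem` characters of stem remain.
    """
    stem = word
    suffixes: list[str] = []
    stripped = True
    while stripped:
        stripped = False
        for suf in ORDERED:
            if stem.endswith(suf) and len(stem) - len(suf) >= min_stem:
                suffixes.insert(0, suf)  # keep left-to-right order
                stem = stem[: -len(suf)]
                stripped = True
                break
    return stem, suffixes


def to_factored(sentence: str) -> str:
    """Render each token as surface|stem|suffixes (NULL if no suffix)."""
    out = []
    for tok in sentence.split():
        stem, sufs = segment(tok.lower())
        out.append(f"{tok}|{stem}|{'+'.join(sufs) or 'NULL'}")
    return " ".join(out)
```

For example, `to_factored("mekteplerde kitablar")` yields `mekteplerde|mektep|ler+de kitablar|kitab|lar`. The intent of exposing the stem as its own factor is that a factored model can still translate or back off on the stem when a particular inflected surface form never occurred in training.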

Similar resources

A Phrase Table Filtering Model Based on Binary Classification for Uyghur-Chinese Machine Translation

In statistical machine translation, a large number of unreasonable phrase pairs in a phrase table can degrade decoding efficiency and overall translation performance, especially in Uyghur-Chinese machine translation. In this paper, we present a novel phrase table filtering model based on binary classification, which considers the differences between Uyghur and Chinese, and draw lessons from bin...

Research for Uyghur-Chinese Neural Machine Translation

The problem of rare and unknown words is an important issue in Uyghur-Chinese machine translation, especially when using neural machine translation models. We propose a novel way to deal with rare and unknown words. Based on neural machine translation using pointers over the input sequence, our approach, which consists of a preprocessing and a post-processing step, can be used in all neural machine translation mod...

Uyghur-Chinese Translation Disambiguation Method Research Based on Knowledge Automatic-Acquisition

This thesis studies disambiguation methods for Uyghur-Chinese translation and, to address the scarcity of Uyghur disambiguation corpora, proposes automatically acquiring a library of translation labels. It draws on an existing Uyghur-Chinese bilingual dictionary, a Chinese corpus and the Internet, and acquires the corresponding Chinese translation label examples to Uyghu...

Chinese-Uyghur Sentence Alignment: An Approach Based on Anchor Sentences

This paper, which builds on previous studies of sentence alignment, introduces a sentence alignment method in which some sentences serve as "anchors" and a two-step procedure is applied. In the first step, lexical information such as proper names, technical terms, numbers and punctuation marks, together with location and length information, is used to generate anchor sentences that satisf...

Rule Based Analysis of the Uyghur Nouns

This paper describes the implementation of a rule-based analyzer for Uyghur (spoken in Sin Kiang, China) nouns. We hope this paper will contribute to advanced studies of the Uyghur language in machine translation and natural language processing. Like all Turkic languages, Uyghur is an agglutinative language with productive inflectional and derivational suffixes. In...

Publication date: 2012